Simplifying Regular Expressions: A Quantitative Perspective
نویسندگان
چکیده
In this work, we consider the efficient simplification of regular expressions. We suggest a quantitative comparison of heuristics for simplifying regular expressions. We propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. We apply this normal form to determine an exact bound for the relation between the two most common size measures for regular expressions, namely alphabetic width and reverse polish notation length. Then we proceed to show that every regular expression of alphabetic with n can be converted into a nondeterministic finite automaton with ǫ-transitions of size at most 4 2 5 n+ 1, and that this bound is optimal. This provides an exact resolution of a research problem posed by Ilie and Yu, who had obtained lower and upper bounds of 4n− 1 and 9n− 1 2 , respectively [L. Ilie, S. Yu: Follow automata. Inform. Comput. 186, 2003]. For reverse polish notation length as input size measure, an optimal bound was recently determined [S. Gulan, H. Fernau: An optimal construction of finite automata from regular expressions. In: Proc. FST&TCS, 2008]. We prove that, under mild restrictions, their construction is also optimal when taking alphabetic width as input size measure.
منابع مشابه
Simplifying Regular Expressions
We consider the efficient simplification of regular expressions and suggest a quantitative comparison of heuristics for simplifying regular expressions. To this end, we propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. This allows us to determine an exact bound for the relation between the two prevalent measures...
متن کاملSimplifying Text Processing with Grammatically Aware Regular Expressions
In our paper we introduce Grammatically Aware Regular expression (GARE) and describe its usage using examples from moral consequences retrieval task. GARE is an extension to the regular expression concept that overcomes many of the difficulties with traditional regexp by adding Normalization (e.g., searching all grammatical forms with basic form of a verb or adjective is possible) or POS awaren...
متن کاملAn Introduction to the StreamQRE Language
Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. We give here an introduction to the StreamQRE language, which has recently been proposed for the purpose of simplifying the task of programming the desired logic in such stream processing applications. StreamQRE provides natura...
متن کاملPragmatic expressions in cross-linguistic perspective
This paper focuses on some pragmatic expressions that are characteristic of informal spoken English, their possible equivalents in some other languages, and their use by EFL learners from different backgrounds. These expressions, called general extenders (e.g. and stuff, or something), are shown to be different from discourse markers and to exhibit variation in form, funct...
متن کاملDerivatives of Quantitative Regular Expressions
Quantitative regular expressions (QREs) have been recently proposed as a high-level declarative language for specifying complex numerical queries over data streams in a modular way. QREs have appealing theoretical properties, and each QRE can be compiled into an efficient streaming algorithm for its evaluation. In this paper, we generalize the notion of Brzozowski derivatives for classical regu...
متن کامل